Volatility conditional on price trends 



Gilles Zumbach 



December, 2004 



Abstract 



The influence of past price behaviour on realized volatility is investigated in the present article. The results show that trending (drifting) prices lead to increased (decreased) realized volatility. This "volatility induced by trend" constitutes a new stylized fact. The past price behaviour is measured by a product of two non-overlapping returns, of the form r·L[r], where L is the lag operator. The effect is studied empirically using USD/CHF foreign exchange data, over a wide range of time horizons. A set of ARCH-based processes is modified in order to include the trend effect, and their forecasting performances are compared. For a better forecast, it is shown that the main factor is the shape of the memory kernel (i.e. a power law), and the second factor is the inclusion of the trend effect.
1 Introduction



The extensive study of the volatility of financial time series started some 20 years ago with the seminal paper of Engle [?]. After a rapid improvement with the GARCH(1,1) process [?, ?], the quantitative results obtained since then have not been much better, despite the very large number of studies with various volatility processes. Even though we have today a much better understanding of the volatility of financial markets, particularly in the high-frequency domain (see e.g. [?] and references therein), it remains difficult to translate this knowledge into better processes or better volatility forecasts.

In order to overcome the apparent limitations of the classical processes, Zumbach, Pictet and Masutti [?] launched a study using genetic programming (GP). Their work focused on improving the efficiency of the GP in order to turn it into a practical tool to investigate financial time series, with an application to volatility forecasting. One important advantage of the GP is that it is not biased by our a priori bias and knowledge, as the program searches the whole space of models (albeit not very efficiently). Indeed, in [?] the GP very quickly rediscovered in essence the GARCH(1,1) model, and then, given more time, was able to obtain better solutions. The analysis of the solutions discovered by the program showed that the new terms leading to the improvement were of the form of a product of returns at two different time horizons. This can be expressed as a sum of two terms, one with a squared return, as in most volatility processes, and one with two non-overlapping returns. The latter term is like r[δt_r](t) · r[δt'_r](t − δt_r), namely, at time t, the product of a return at time horizon δt_r with a return at another time horizon δt'_r, lagged by δt_r so as not to overlap the first return. In short, we denote such terms generically as r·L[r], where L[r] denotes the lagged return. Notice that such terms are even in the return, namely under the change r → −r, the term r·L[r] does not change its sign.

This new term can be interpreted as a measure of the past price moves, namely whether the market is trending or drifting. The action of the "trend term" r·L[r] can be understood as follows. If both returns have the same sign (both positive or both negative), the market is trending, and this may induce the market participants to change their positions because of the price move. The trading of their positions increases the volatility in the subsequent period. If the market is drifting, the two returns have different signs, and the essentially unchanged price leads the market participants to keep their positions. This decreases the volatility in the subsequent period. This behaviour of the market participants creates a positive correlation between the trend term and the realized volatility in the following period. Indeed, the correlation of the r·L[r] term with the realized volatility, namely with the volatility computed after t, is positive (see below). This "volatility induced by trend" is a new stylized fact for financial time series. Notice that if a return changes sign but keeps the same magnitude, its contribution to the historical volatility is identical. Therefore, the trend term measures the recent price behaviour, and not the historical volatility.

The goal of this paper is twofold. First, we would like to investigate the trend effect in empirical data. The main questions are its magnitude, and the time horizons of the return, lagged return and realized volatility at which the trend effect is important. Second, we want to include the effect in ARCH-like processes. The addition of such a term in a process is easy: it is enough to add one or several r·L[r] terms in the equations. In this context, the point is not to create yet another ARCH-like process, but to investigate the respective importance of the many ingredients that can enter into a volatility process, for example the shape of the memory kernel (rectangular, exponential, power law), the short-term mean reversion, the trend effect, or the various classes of market participants. The point here is to measure the importance of these different stylized facts.
The paper is organized essentially along the lines of the previous paragraph. First, we describe the empirical data (section 2), and compute the correlation between the r·L[r] term and the realized volatility (section 3). In section 4, we extend several processes with trend term(s), while respecting the basic idea of each process (one or several components, power law, etc.). The processes are compared in section 5 with respect to their forecasting performances, in light of the properties of the processes.

In many respects, this contribution is an extension of some results presented in [?]. In particular, it follows the same notation and process definitions. This paper is self-contained, but in order not to repeat this reference extensively, some sections are reduced to the essentials. More details can be found in [?], for example a discussion of the relative merits of the log-likelihood and of the volatility forecast, in relation to a given quadratic process.



2 The data set 



The data set used for this article is derived from high-frequency tick-by-tick data for the foreign exchange rate USD/CHF. Essentially, for the empirical analysis, the data set is a regular time series for USD/CHF, sampled every 3 minutes in business time. First, the high-frequency data is filtered for the incoherent effect [?]: a very short exponential moving average is taken on the prices in order to attenuate the tick-by-tick incoherent price formation noise. Second, the price is sampled every 3 minutes in business time. More precisely, we use the dynamic business time scale as developed in [?]. Similarly to the familiar daily business time scale, the dynamic business time scale contracts periods of low activity (night, week-end) and expands periods of high activity. The activity pattern during the week is related to the measured volatility, averaged on a moving sample of 6 months. Holidays and daylight saving time are taken into account. The homogeneous time series used for the empirical analysis is computed from the filtered high-frequency tick-by-tick USD/CHF data, sampled using linear interpolation every 3 minutes in the dynamic time scale. As this market is open essentially 24 hours per day, 5 days per week, the sampling time interval corresponds to an average of 2'8" (= 5/7 × 3') during the market opening hours. The author is grateful to Olsen & Associates, in Zürich, Switzerland, for providing the data.
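The two preprocessing steps described above (a short EMA on the tick prices, then linear interpolation onto a regular grid) can be sketched as follows. This is a minimal illustration, not the filter of [?]: it assumes the tick times are already expressed in the business time scale (the dynamic business time mapping is not reproduced), it uses a simple inhomogeneous EMA that weights each new tick by 1 − exp(−Δt/τ), and the function names and the value of τ are hypothetical.

```python
import numpy as np


def tick_ema(times, prices, tau):
    """Short EMA on irregularly spaced ticks (hypothetical helper).

    Each new tick gets weight 1 - exp(-dt/tau), where dt is the time elapsed
    since the previous tick, so closely spaced ticks are averaged together.
    """
    times = np.asarray(times, dtype=float)
    prices = np.asarray(prices, dtype=float)
    out = np.empty_like(prices)
    out[0] = prices[0]
    for i in range(1, len(prices)):
        mu = np.exp(-(times[i] - times[i - 1]) / tau)
        out[i] = mu * out[i - 1] + (1.0 - mu) * prices[i]
    return out


def resample_linear(times, values, step):
    """Sample a tick series on a regular grid by linear interpolation."""
    times = np.asarray(times, dtype=float)
    n_steps = int(np.floor((times[-1] - times[0]) / step))
    grid = times[0] + step * np.arange(n_steps + 1)
    return grid, np.interp(grid, times, values)


# Usage: times in business minutes, 3-minute grid (tau = 0.5 min is illustrative only).
ticks_t = [0.0, 1.0, 2.5, 4.0, 7.2]
ticks_p = [1.2500, 1.2502, 1.2501, 1.2505, 1.2503]
grid, sampled = resample_linear(ticks_t, tick_ema(ticks_t, ticks_p, tau=0.5), step=3.0)
```

The EMA is applied before the interpolation so that the grid values are not dominated by the last, possibly noisy, tick before each grid point.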



3 Empirical analysis 

The historical returns r[δt_r] and lagged historical returns L[δt_r; r[δt'_r]] are computed by simple price differences from the sampled logarithmic prices. The realized volatility is computed with

$$\sigma^2[\delta t_\sigma, \delta t_r = \delta t_\sigma/32](t) = \frac{1}{n} \sum_{t + \delta t_r \le t' \le t + \delta t_\sigma} r^2[\delta t_r](t') \qquad (1)$$

where n is the number of terms in the sum. The realized volatility measures the fluctuations of the prices after t, in the interval from t to t + δt_σ. The time horizon δt_σ, over which the volatility is computed, is the main volatility parameter. The returns are taken at time horizons δt_r = δt_σ/32, namely they grow with δt_σ. Other choices can be made, with a minor influence on the empirical results. The sum in eq. (1) is computed with all points on the 3' sampling grid.

With the three time series r, L[r] and σ, we compute the correlation

$$\rho[\delta t_r, \delta t'_r, \delta t_\sigma] = \rho\left(r[\delta t_r] \cdot L[\delta t_r; r[\delta t'_r]],\; \sigma[\delta t_\sigma]\right) \qquad (2)$$

where ρ(x, y) on the right-hand side denotes the usual linear correlation between the time series x and y.
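Equations (1) and (2) can be sketched numerically as below. This is an illustrative sketch, not the author's code: it assumes a NumPy array of log prices on the regular sampling grid, expresses all horizons as integer numbers of grid steps, uses overlapping returns at every grid point (as stated for eq. (1)), and the function names are hypothetical.

```python
import numpy as np


def returns(log_price, k):
    """r[k dt](t) = x(t) - x(t - k dt); NaN where the return is undefined."""
    r = np.full(len(log_price), np.nan)
    r[k:] = log_price[k:] - log_price[:-k]
    return r


def realized_vol(log_price, k_sigma, k_r):
    """Eq. (1): mean of r^2[k_r dt](t') for t + k_r <= t' <= t + k_sigma, then sqrt."""
    n = len(log_price)
    r2 = np.nan_to_num(returns(log_price, k_r) ** 2)  # undefined entries are never used
    cs = np.concatenate(([0.0], np.cumsum(r2)))
    out = np.full(n, np.nan)                          # NaN where the future window is incomplete
    t = np.arange(n - k_sigma)
    window_sum = cs[t + k_sigma + 1] - cs[t + k_r]
    out[t] = np.sqrt(np.maximum(window_sum, 0.0) / (k_sigma - k_r + 1))
    return out


def trend_correlation(log_price, k_r, k_r_lag, k_sigma):
    """Eq. (2): corr(r[k_r](t) * r[k_r_lag](t - k_r), sigma[k_sigma](t))."""
    r = returns(log_price, k_r)
    lagged = np.full(len(log_price), np.nan)
    lagged[k_r:] = returns(log_price, k_r_lag)[:-k_r]  # L[dt_r; r[dt'_r]]
    trend = r * lagged
    sigma = realized_vol(log_price, k_sigma, max(1, k_sigma // 32))
    ok = np.isfinite(trend) & np.isfinite(sigma)
    return np.corrcoef(trend[ok], sigma[ok])[0, 1]


# Usage on a synthetic random walk with constant volatility: there is no trend
# effect by construction, so the correlation is expected to be close to 0.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0.0, 1e-4, 50_000))
print(trend_correlation(x, k_r=2, k_r_lag=2, k_sigma=32))
```

Scanning k_r, k_r_lag and k_sigma over a logarithmic grid of horizons gives the three-dimensional correlation function whose cuts are discussed below.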





Figure 1: Correlation between the trend term r·L[r] and the realized volatility. The fixed volatility time horizon is δt_σ = 1h36. The horizontal axis is the (historical) return time interval δt_r, the vertical axis is the (historical) lagged return time interval δt'_r. The axis divisions correspond to the logarithm of the time intervals. The main average physical time intervals corresponding to the labels are 1 hour (n ≈ 12), 8 hours (n ≈ 24), 1 day (n = 31), 1 week (n = 40), and 1 month (n = 47). Data courtesy of Olsen and Associates, Zurich.

The empirical correlation is difficult to visualize, as it is a function of three time intervals. The two-dimensional results presented below are cuts through this three-dimensional space, showing the level of correlation with colors. Figure 1 is a cut at fixed volatility time interval, at the shortest volatility estimate δt_σ = 32 · 3' = 96' (in business time). The two axes correspond respectively to the return time interval and the lagged return time interval. The main structures emerging from this figure are:

• The correlation is essentially positive. This is consistent with the intuitive explanation given in the introduction, as trends lead the market participants to modify their positions, increasing the subsequent volatility. Moreover, the level of correlation ranges from 3 to 8%, corresponding to an effect of medium importance.

• We observe 4 pockets with higher correlations, roughly along the diagonal or above. They are located approximately at positions 10 to 20, 30, 40 and 50, corresponding respectively to intra-day, 1 day, 1 week and 1 month time intervals. These time intervals are precisely the expected time horizons for the main groups of traders, in agreement with the findings of [?]. The maximum at 1 month is very well defined, and located at δt_r = δt'_r. At the other corner, the intra-day maximum is fairly soft, and clearly above the diagonal. The maximum at 1 day shows similar characteristics, whereas the maximum at 1 week is along the diagonal and weaker.





Figure 2: Correlation between the trend term r·L[r] and the realized volatility. The fixed volatility time horizon is δt_σ = 1 day. The axes and color coding are as for Figure 1.

Figure 2 is a cut at fixed volatility time interval, with δt_σ = 1 day. For this daily volatility, the 4 maxima can be seen, essentially along the diagonal δt_r = δt'_r. The intra-day maximum is very weak, the daily and monthly maxima are very clear, while the weekly maximum is weaker. This shows that the volatility at the daily time horizon is influenced not by intra-day trends, but by trends at the daily horizon or longer. Both previous figures indicate that the main correlation lies along the diagonal δt_r = δt'_r, or slightly above.

Figure 3 is a cut along the plane δt_r = δt'_r, with the volatility time horizon δt_σ on the vertical axis. The labels on the vertical volatility axis δt_σ correspond to the same time intervals as on the horizontal axis δt_r. The maxima at the daily and monthly time horizons are very clear, and the weaker intra-day and weekly maxima can also be seen. This figure shows that trends at a given time horizon influence the volatility up to the immediately larger horizon, and then decline. For example, a trend measured by consecutive daily returns strongly influences the volatility up to 3 to 4 days, but much less at one week and above. Indeed, this behaviour is what can be expected from a portfolio manager working on a daily horizon and adjusting the portfolio based on the price trends or drifts of the last few days.



Figure 3: Correlation between the trend term r·L[r] and the realized volatility. The horizontal axis is the time horizon for both historical returns δt_r = δt'_r. The volatility time horizon δt_σ is given on the vertical axis. The axis divisions and color coding are as for Figure 1.

4 Modelling in ARCH-like processes: the ARTCH family

The influence of trends on the subsequent volatility is fairly easy to incorporate in ARCH-like processes. Essentially, we must add one or several terms of the form r·L[r] in the conditional volatility. For a given ARCH-like model, there is in general one corresponding minimal model incorporating the influence of trends. We generically call this new family of models 'ARTCH', for Auto Regressive Trend Conditional Heteroskedastic. Let us emphasize that our goal lies not in writing new ARCH processes, but in quantifying the various effects that can be included in a model. To address such questions, we build a set of processes of increasing complexity. All the processes are then estimated by minimizing the 1 day volatility forecast error on the USD/CHF data. The comparison of their optimal forecasting performances measures their relative adequacy to the data, and the importance of the various effects (at the chosen time horizon).

Our general strategy is to use quadratic ARCH-like processes to generate volatility forecasts. For 
quadratic processes, conditional averages can be computed analytically, and one obtains volatility 
forecasts that depend on the process parameters. The properties of the forecast derive directly from 
the properties and parameters of the underlying process. This approach gives a simple framework 
in which data generating processes and volatility forecasts are closely related. For forecasting 
purposes only, other direct approaches can be pursued. Poon and Granger [?] wrote an extensive review of volatility forecasting.
Although the above strategy is appealing, the difficulty of its practical implementation should not be underestimated. For each process, the needed conditional averages must be computed and implemented numerically. Moreover, as we are estimating the parameters by minimizing the forecast error, the derivatives with respect to the parameters must also be implemented. The theoretical setting closely follows [?], and only the salient points are given here for completeness. First, we set the basic common equations, and then describe the various ARTCH processes.



4.1 The basic structure of the processes 

We consider processes for the price with the following structure

$$x(t + \delta t) = x(t) + r(t + \delta t) \qquad (3)$$
$$r(t + \delta t) = \sigma_{\mathrm{eff}}(t + \delta t)\, \epsilon(t + \delta t) \qquad (4)$$
$$\sigma^2_{\mathrm{eff}}(t + \delta t) = \sigma^2_{\mathrm{eff}}[\Omega(t), \theta, \delta t] \qquad (5)$$

The random variables ε are i.i.d. with E[ε(t)] = 0 and E[ε²(t)] = 1. The time indexes are chosen to emphasize that σ_eff(t + δt) is a forecast for the volatility at time t + δt. The forecast function σ_eff is based on the information set Ω(t) at time t, and depends on a set of parameters θ.

In the processes below, the right-hand side of eq. (5) contains terms of the form r·L[r], which have no definite sign. As a consequence, the squared volatility could become negative. In practice, this never occurs at the optimal values of the parameters, but it could happen during the parameter estimation. To prevent taking the square root of a negative value, the right-hand side includes a lower threshold. Implicit in all the ARTCH equations, a max function max(σ²_eff, σ²_min) is included, where σ²_eff is given by the equations below. The minimal value for the squared volatility is σ²_min = 10^−10.



This possible negative variance and the related max function are likely to be an artifact of the present ARCH setting, which includes only price and volatility. A more complete framework should also include the tick rate. It is likely that the "volatility induced by trend" stylized fact is related to trading decisions that influence the tick rate and/or the rate of new orders, which in turn influences the volatility.



4.2 The GARTCH(1,1) process 

The GARTCH(1,1) process equations are

$$\sigma_1^2(t) = \mu\, \sigma_1^2(t - \delta t) + (1 - \mu)\, r^2(t) \qquad (6)$$
$$\sigma^2_{\mathrm{eff}}(t + \delta t) = \sigma^2_\infty + (1 - w_\infty)\left(\sigma_1^2(t) - \sigma^2_\infty\right) + \theta\, r[l\delta t](t)\, r[l\delta t](t - l\delta t) \qquad (7)$$

with the 4 parameters σ_∞, w_∞, μ, θ, and the integer lag parameter l. For θ = 0, these equations reduce to the usual GARCH(1,1) process. The rationale for the equations is the following. The "internal" state σ_1 measures the volatility at the time horizon τ = −δt/ln(μ), which is the volatility computed or perceived by a group of market participants. The equation models their trading pattern, which depends on the difference with the (expected) mean volatility, and on the recent trend/drift of the price. The "parameter" l can be chosen a priori with the guidance of the empirical analysis above. It can also be studied systematically, as done in section 5.3.
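A minimal sketch of the GARTCH(1,1) recursion, eqs. (6) and (7), including the variance floor max(σ²_eff, σ²_min) of section 4.1. It assumes one-step log returns on the sampling grid as a NumPy array, initializes the internal state σ_1² at σ²_∞ (an arbitrary choice that the build-up period is meant to wash out), and sets the trend term to 0 until 2l returns are available; the function name is hypothetical and no parameter estimation is shown.

```python
import numpy as np


def gartch11_variance(r, mu, w_inf, sigma2_inf, theta, lag, sigma2_min=1e-10):
    """out[t] is sigma_eff^2(t + dt), computed from returns up to r[t]."""
    r = np.asarray(r, dtype=float)
    x = np.concatenate(([0.0], np.cumsum(r)))   # log price; x[t + 1] - x[t] = r[t]
    s1 = sigma2_inf                             # internal state sigma_1^2 (initial guess)
    out = np.empty(len(r))
    for t in range(len(r)):
        s1 = mu * s1 + (1.0 - mu) * r[t] ** 2   # eq. (6)
        trend = 0.0
        if t + 1 >= 2 * lag:
            r_now = x[t + 1] - x[t + 1 - lag]            # r[l dt](t)
            r_lag = x[t + 1 - lag] - x[t + 1 - 2 * lag]  # r[l dt](t - l dt)
            trend = r_now * r_lag
        s2 = sigma2_inf + (1.0 - w_inf) * (s1 - sigma2_inf) + theta * trend  # eq. (7)
        out[t] = max(s2, sigma2_min)            # variance floor of section 4.1
    return out
```

With theta = 0 the function reproduces GARCH(1,1), and with w_inf = 0 it gives the linear I-GARTCH(1) form of section 4.3.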
4.3 The I-GARTCH(1) process



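For w_∞ = 0, the GARTCH(1,1) equations reduce to the linear I-GARTCH(1) process

$$\sigma_1^2(t) = \mu\, \sigma_1^2(t - \delta t) + (1 - \mu)\, r^2(t) \qquad (8)$$
$$\sigma^2_{\mathrm{eff}}(t + \delta t) = \sigma_1^2(t) + \theta\, r[l\delta t](t)\, r[l\delta t](t - l\delta t) \qquad (9)$$

with two parameters μ and θ. A slight variation consists in using the definition

$$\sigma^2_{\mathrm{eff}}(t + \delta t) = \mu\, \sigma^2_{\mathrm{eff}}(t) + (1 - \mu)\, r^2(t) + \theta\, r[l\delta t](t)\, r[l\delta t](t - l\delta t) \qquad (10)$$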



In practice, both definitions give very similar results. The empirical results on the 1 day forecast 
accuracy have been computed with the second definition. 
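The second I-GARTCH(1) definition, eq. (10), recurses directly on σ²_eff. The sketch below is illustrative only: it assumes one-step returns in a NumPy array and a user-supplied starting value for σ²_eff (a hypothetical `sigma2_init` argument), and it applies the same variance floor as in section 4.1.

```python
import numpy as np


def igartch1_variance(r, mu, theta, lag, sigma2_init, sigma2_min=1e-10):
    """Eq. (10): out[t] = sigma_eff^2(t + dt) from returns up to r[t]."""
    r = np.asarray(r, dtype=float)
    x = np.concatenate(([0.0], np.cumsum(r)))   # log price; x[t + 1] - x[t] = r[t]
    s2 = sigma2_init                            # sigma_eff^2 before the first return
    out = np.empty(len(r))
    for t in range(len(r)):
        trend = 0.0
        if t + 1 >= 2 * lag:                    # r[l dt](t) * r[l dt](t - l dt)
            trend = (x[t + 1] - x[t + 1 - lag]) * (x[t + 1 - lag] - x[t + 1 - 2 * lag])
        s2 = max(mu * s2 + (1.0 - mu) * r[t] ** 2 + theta * trend, sigma2_min)
        out[t] = s2
    return out
```

Unlike eqs. (8) and (9), the trend contribution here is fed back into the next value of σ²_eff, so a persistent trend keeps raising the variance over several steps.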

We have also included in the study the RiskMetrics volatility. This process corresponds to the I-GARCH process with a fixed parameter μ. As we are working with hourly data (in business time scale), we take μ = 0.94^(1/24).
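As a worked example of the memory length τ = −δt/ln(μ) introduced in section 4.2, the hourly RiskMetrics coefficient translates into a characteristic time of roughly 16 business days. The arithmetic below assumes the standard RiskMetrics daily decay factor 0.94 and 24 hourly points per business day (120 points per week over 5 business days, as for the hourly data set of section 5.1).

```python
import math

MU_DAILY = 0.94          # standard RiskMetrics daily decay factor (assumed here)
POINTS_PER_DAY = 24      # hourly business-time sampling: 120 points per week / 5 days

mu_hourly = MU_DAILY ** (1.0 / POINTS_PER_DAY)
tau_hours = -1.0 / math.log(mu_hourly)          # tau = -dt / ln(mu) with dt = 1 hour
tau_days = tau_hours / POINTS_PER_DAY

print(f"mu = {mu_hourly:.5f}, tau = {tau_hours:.0f} h = {tau_days:.1f} business days")
# prints: mu = 0.99743, tau = 388 h = 16.2 business days
```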

4.4 The Long Memory volatility processes

The long memory process was introduced in [?]. It incorporates in a minimal way the power law decay of the lagged correlation for the absolute value of the return, or of the squared return. This model is structureless with respect to the time horizons, namely it has a uniform structure between a lower and an upper cut-off. Therefore, it does not include the specific market components as observed and modeled in [?, ?]. In the empirical analysis below, we have used the "microscopic" version of the long memory process (see [?]). The inclusion of a trend term in the long memory process should preserve this simple and uniform structure. Therefore, the idea is to include one r·L[r] term for each partial volatility σ_k, and to have weights given by a simple power law.

The long memory models are built with a set of (historical) volatilities computed over a set of time horizons increasing as a geometric series:

$$\sigma_k^2(t) = \mu_k\, \sigma_k^2(t - \delta t) + (1 - \mu_k)\, r^2(t), \qquad \mu_k = \exp(-\delta t/\tau_k), \qquad \tau_k = \tau_1\, l_k, \qquad k = 1, \ldots, n \qquad (11)$$

The time horizons τ_k correspond to the characteristic times of the EMAs with which the historical volatilities are measured. The time structure l_k of the process is a geometric series l_k = ρ^(k−1), with the progression of the series chosen to be ρ = 2 in this work. The base time scale τ_1 corresponds to the shortest time scale at which a volatility is measured, and is one of the process parameters. The empirical studies in this work have been done with hourly data and with n = 12 components, corresponding to an upper cut-off of 6 months. The effective volatility σ_eff for the long memory (LM) affine (Aff) processes LM-Aff-ARTCH(n), with n components and trends, is



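$$\sigma^2_{\mathrm{eff}}(t + \delta t) = \sum_{k=1}^{n} w_k\, \sigma_k^2(t) + w_\infty\, \sigma^2_\infty + \sum_{k=1}^{n} \theta_k\, r[l_k\delta t](t)\, r[l_k\delta t](t - l_k\delta t) \qquad (12)$$

$$\chi_k = c\, \rho^{-(k-1)\lambda} = c \left(\frac{1}{l_k}\right)^{\lambda} \quad \text{with} \quad \frac{1}{c} = \sum_{k=1}^{n} \rho^{-(k-1)\lambda}$$
$$w_k = (1 - w_\infty)\, \chi_k$$
$$\theta_k = \theta_0\, \rho^{-(k-1)\lambda'} = \theta_0 \left(\frac{1}{l_k}\right)^{\lambda'} \qquad (13)$$

The "normalization constant" c is chosen so that Σ_k χ_k = 1 and Σ_k w_k + w_∞ = 1. The "mean terms" introduce two constants σ_∞ and w_∞. For w_∞ = 0, the linear model is obtained. The "trend terms" depend on the two constants θ_0 and λ'.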
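The weight structure of eqs. (11) to (13) can be sketched as follows, assuming EMA components at horizons τ_k = τ_1 l_k with l_k = ρ^(k−1). The function names and the numerical parameter values in the usage lines are hypothetical; the sketch only evaluates one step of eq. (12) from given component variances and trend products, without any estimation.

```python
import numpy as np


def lm_artch_weights(n, rho, lam, lam_trend, w_inf, theta0):
    """Eq. (13): power-law weights for the volatility and trend terms."""
    k = np.arange(1, n + 1)
    l_k = rho ** (k - 1.0)                  # time structure l_k = rho^(k-1)
    chi = l_k ** (-lam)
    chi = chi / chi.sum()                   # normalization c: sum_k chi_k = 1
    w_k = (1.0 - w_inf) * chi               # so that sum_k w_k + w_inf = 1
    theta_k = theta0 * l_k ** (-lam_trend)
    return l_k, w_k, theta_k


def lm_components_step(sigma2_k, r_t, tau_k, dt=1.0):
    """Eq. (11): update every EMA component with the latest squared return."""
    mu_k = np.exp(-dt / tau_k)
    return mu_k * sigma2_k + (1.0 - mu_k) * r_t ** 2


def lm_artch_sigma2(sigma2_k, trend_k, w_k, theta_k, w_inf, sigma2_inf):
    """Eq. (12): sigma_eff^2(t + dt) from components and trend products at t."""
    return float(w_k @ sigma2_k + w_inf * sigma2_inf + theta_k @ trend_k)


# Usage with 12 components and rho = 2 (parameter values are illustrative only).
l_k, w_k, theta_k = lm_artch_weights(12, rho=2.0, lam=0.3, lam_trend=0.5, w_inf=0.1, theta0=0.05)
tau_k = 1.0 * l_k                           # hypothetical tau_1 = 1 hour
sigma2_k = lm_components_step(np.full(12, 1e-5), r_t=0.002, tau_k=tau_k)
sigma2 = lm_artch_sigma2(sigma2_k, np.zeros(12), w_k, theta_k, w_inf=0.1, sigma2_inf=1e-5)
```

In a full implementation, trend_k would hold the products r[l_kδt](t) r[l_kδt](t − l_kδt) computed from the log prices.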

In the comparative study, the long memory models allow us to compare processes with short memory (exponential) and long memory (power law). We include 4 versions of the long memory process, namely a linear model LM-Lin-AR(T)CH (similar to I-GAR(T)CH, but with long memory) and an affine model LM-Aff-AR(T)CH (similar to GAR(T)CH, but with long memory), each without and with the trend terms.



4.5 The Market Component volatility processes 



In [?, ?], the correlation between historical and realized volatility is computed for a range of time horizons, and a similar computation is done for the change in historical volatility versus the realized volatility. These correlations clearly show the characteristic time horizons of the market participants, essentially at intra-day, daily, weekly and monthly time intervals. The observed heterogeneity of the volatility correlations can be reproduced with a process that incorporates the same characteristic time horizons. As noted in the empirical section, the trend effect shows very similar characteristic time intervals. In order to include the trend effect in the market component process, it is enough to add a trend term for each volatility component. The market component ARCH process is presented in detail in [?]; for completeness, we give its definition here.

For the market component model, instead of measuring the (historical) volatilities with a simple Exponential Moving Average (EMA), which has an exponential kernel, we use an MA operator, which has a more rectangular-like kernel. The MA operator is defined by [?]:

$$\mathrm{MA}[\tau, m; z](t) = \frac{1}{m} \sum_{j=1}^{m} \mathrm{EMA}_j(t) \qquad (14)$$
$$\mathrm{EMA}_1(t) = \mu\, \mathrm{EMA}_1(t - \delta t) + (1 - \mu)\, z(t)$$
$$\mathrm{EMA}_j(t) = \mu\, \mathrm{EMA}_j(t - \delta t) + (1 - \mu)\, \mathrm{EMA}_{j-1}(t), \qquad j = 2, \ldots, m$$
$$\mu = \exp\left(-\frac{\delta t\,(m + 1)}{2\tau}\right)$$

The coefficient μ is computed from the time horizon τ, so that the memory length of the MA operator is τ. The parameter m controls the shape of the kernel, and we take m = 8, which gives a fairly rectangular kernel.
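The iterated-EMA construction of eq. (14) can be written compactly as below. This is a sketch, not the reference implementation of [?]: it assumes a regularly sampled input z, initializes all iterated EMAs at z[0], and the function name is hypothetical. The mean lag of the resulting kernel is (m + 1)/2 · μ/(1 − μ) · δt, which is slightly below τ because of the time discretization (approximately τ − (m + 1)δt/4).

```python
import numpy as np


def ma_operator(z, tau, m=8, dt=1.0):
    """Eq. (14): average of m iterated EMAs, each with mu = exp(-dt (m + 1) / (2 tau))."""
    z = np.asarray(z, dtype=float)
    mu = np.exp(-dt * (m + 1) / (2.0 * tau))
    ema = np.full(m, z[0])              # EMA_1 ... EMA_m, initialized at the first value
    out = np.empty(len(z))
    for t, z_t in enumerate(z):
        inp = z_t                       # EMA_1 is fed by z(t)
        for j in range(m):
            ema[j] = mu * ema[j] + (1.0 - mu) * inp
            inp = ema[j]                # EMA_{j+1} is fed by the updated EMA_j(t)
        out[t] = ema.mean()
    return out


# Usage: a market-component volatility is the MA applied to squared returns,
# e.g. sigma2_day = ma_operator(r ** 2, tau=24.0) for a 1-day component on hourly data.
```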
The Mkt-Aff-ARTCH process equations, with the trend terms, are:

$$\sigma_k^2(t) = \mathrm{MA}[\tau_k, m; r^2](t) \qquad (15)$$
$$\sigma^2_{\mathrm{eff}}(t + \delta t) = \sigma^2_\infty + (1 - w_\infty) \sum_{k=1}^{n} \chi_k \left(\sigma_k^2(t) - \sigma^2_\infty\right) + \sum_{k=1}^{n} \theta_k\, r[l_k\delta t](t)\, r[l_k\delta t](t - l_k\delta t) \qquad (16)$$

with the constraint

$$\sum_{k=1}^{n} \chi_k = 1. \qquad (17)$$

The linear version of the process, Mkt-Lin-ARTCH, is obtained by taking w_∞ = 0.



4.6 Other processes 

For completeness, we include in the study a few processes which do not have an obvious extension incorporating the trend effect. The first one is the "permanent forecast". The historical volatility σ_hist[δt_σ], with equal weight on a time interval δt_σ, is

$$\sigma^2_{\mathrm{hist}}[\delta t_\sigma](t) = \frac{1}{n} \sum_{t - \delta t_\sigma + \delta t_r \le t' \le t} r^2[\delta t_r](t') \qquad (18)$$

where n is the number of terms in the sum. As all the points are equally weighted from t − δt_σ to t, the memory kernel corresponds to a rectangular moving average. The "permanent forecast" uses σ_hist for the volatility forecast over any time horizon.
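A sketch of eq. (18), which mirrors the realized-volatility window of eq. (1) but looks backward from t. It assumes log prices on a regular grid, horizons expressed in grid steps, and a hypothetical function name; the permanent forecast issued at t for any future horizon is simply the value returned for t.

```python
import numpy as np


def historical_vol(log_price, k_sigma, k_r):
    """Eq. (18): rectangular average of r^2[k_r dt](t') for t - k_sigma + k_r <= t' <= t."""
    x = np.asarray(log_price, dtype=float)
    n = len(x)
    r2 = np.zeros(n)
    r2[k_r:] = (x[k_r:] - x[:-k_r]) ** 2
    cs = np.concatenate(([0.0], np.cumsum(r2)))
    out = np.full(n, np.nan)                # undefined until a full window is available
    t = np.arange(k_sigma, n)
    window_sum = cs[t + 1] - cs[t - k_sigma + k_r]
    out[t] = np.sqrt(np.maximum(window_sum, 0.0) / (k_sigma - k_r + 1))
    return out
```

On hourly data, k_sigma = 24, 120 or about 500 steps would give permanent forecasts based on 1 day, 1 week or 1 month of history.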

The I-GARCH(2) process is a natural extension of GARCH(1,1), where the mean volatility σ_∞ is replaced by an exponential moving average:

$$\sigma_1^2(t) = \mu_1\, \sigma_1^2(t - \delta t) + (1 - \mu_1)\, r^2(t)$$
$$\sigma_2^2(t) = \mu_2\, \sigma_2^2(t - \delta t) + (1 - \mu_2)\, r^2(t)$$
$$\sigma^2_{\mathrm{eff}}(t + \delta t) = w\, \sigma_2^2(t) + (1 - w)\, \sigma_1^2(t) \qquad (19)$$

This process is linear (in σ_k² and r²), with two components. We name it I-GARCH(2) as it is a natural extension of I-GARCH(1).
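Eq. (19) amounts to mixing two EMAs of the squared returns. A minimal sketch, assuming one-step returns in a NumPy array, a hypothetical common starting value `sigma2_init` for both components, and μ_2 > μ_1 so that σ_2 plays the role of the slowly moving mean:

```python
import numpy as np


def igarch2_variance(r, mu1, mu2, w, sigma2_init):
    """Eq. (19): out[t] = sigma_eff^2(t + dt) = w sigma_2^2(t) + (1 - w) sigma_1^2(t)."""
    s1 = s2 = sigma2_init
    out = np.empty(len(r))
    for t, r_t in enumerate(np.asarray(r, dtype=float)):
        s1 = mu1 * s1 + (1.0 - mu1) * r_t ** 2   # shorter-memory component
        s2 = mu2 * s2 + (1.0 - mu2) * r_t ** 2   # longer-memory component (replaces sigma_inf)
        out[t] = w * s2 + (1.0 - w) * s1
    return out
```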



5 Volatility forecast using the processes 
5.1 The data set 

For all the empirical results presented in this section, the data set is a regular time series for USD/CHF, sampled hourly in business time. This series is obtained by aggregating by a factor 28 the homogeneous time series used in the empirical analysis. The resulting sampling time interval of 1h24m corresponds to 7/5 of one hour, and is such that on average 120 points per week are taken. Essentially, during the business week from Monday to Friday, it corresponds to one point per hour, and no sampling point is taken during the week-end. Therefore, we will use the (imprecise) word "hourly" when referring to this data set. The data set runs from 1.1.1989 to 1.7.2000. The year 1989 is used for the build-up of the processes (i.e. the data for 1989 are fed into the volatility processes so that they build their internal states and forget the initial conditions, but the cost functions do not include these data). The following 10.5 years of data are used for the various studies.



5.2 Volatility forecast and its measure of quality 

For a quadratic process, the corresponding volatility forecast can be obtained by conditional averages. Essentially, at time t and with the information set Ω(t), the forecasted volatility at time t + Δt is given by the conditional average

$$\mathcal{F}[j\delta t; \sigma^2_{\mathrm{eff}}](t) = E\left[\sigma^2_{\mathrm{eff}}(t + \Delta t) \mid \Omega(t)\right] \qquad (20)$$

with Δt = jδt. In order to compare with the realized volatility, one must compute the mean forecasted volatility between t and t + Δt

$$\mathcal{F}[\Delta t; \sigma^2_{\mathrm{eff}}](t) = \frac{1}{m} \sum_{j=1}^{m} \mathcal{F}[j\delta t; \sigma^2_{\mathrm{eff}}](t) \qquad (21)$$

with Δt = mδt. The processes above are estimated by minimizing the root mean square error RMSE[Δt, θ] between the forecasted and realized volatility

$$\mathrm{RMSE}^2[\Delta t, \theta] = \frac{1}{N} \sum_t \left(\sqrt{\mathcal{F}[\Delta t, \theta; \sigma^2_{\mathrm{eff}}](t)} - \sigma_{\mathrm{real}}[\Delta t](t)\right)^2 \qquad (22)$$

where θ is the set of process parameters and N the number of terms in the sum. One measure of quality used to compare the processes below is the relative RMSE, given by

$$\mathrm{rel.RMSE} = 1 - \frac{\mathrm{RMSE}}{\mathrm{stdDev}[\sigma_{\mathrm{real}}]} \qquad (23)$$

with stdDev[x] the standard deviation of the time series x. The relative RMSE measures the RMSE compared to the standard deviation of the realized volatility, with values around 0 when the forecast is uncorrelated with the realized volatility, and value 1 for a perfect forecast (i.e. it has a similar range as the linear correlation). The other measure of quality used to compare processes is the usual linear correlation between the forecasted and realized volatility.
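To make eqs. (20) to (23) concrete, the sketch below evaluates the mean forecast for the trend-free GARCH(1,1) case (θ = 0 in eq. (7)) and the relative RMSE. For that case, iterating the conditional expectation of eqs. (6) and (7), using E[r²(t + jδt) | Ω(t)] = F[jδt; σ²_eff](t), gives F[jδt] = σ²_∞ + φ^(j−1)(σ²_eff(t + δt) − σ²_∞) with φ = 1 − w_∞(1 − μ). This closed form is derived here for illustration and is not taken from the paper; the forecasts of the trend processes require additional conditional averages that are not shown, and the variance floor is ignored. Function names are hypothetical.

```python
import numpy as np


def garch11_mean_forecast(sigma2_next, mu, w_inf, sigma2_inf, m):
    """Eqs. (20)-(21) for GARCH(1,1) with theta = 0: mean of the j = 1..m step forecasts.

    sigma2_next is sigma_eff^2(t + dt), already known at time t (scalar or array over t).
    """
    phi = 1.0 - w_inf * (1.0 - mu)          # persistence of the conditional variance
    j = np.arange(1, m + 1)
    dev = np.asarray(sigma2_next, dtype=float)[..., None] - sigma2_inf
    f_j = sigma2_inf + phi ** (j - 1) * dev  # eq. (20), one column per step j
    return f_j.mean(axis=-1)                 # eq. (21)


def relative_rmse(sigma2_forecast, sigma_real):
    """Eqs. (22)-(23): 1 - RMSE / stdDev[sigma_real]."""
    err = np.sqrt(sigma2_forecast) - sigma_real
    rmse = np.sqrt(np.mean(err ** 2))
    return 1.0 - rmse / np.std(sigma_real)


# Usage: 1-day forecast on hourly data, i.e. m = 24 steps (parameters illustrative only).
fcst = garch11_mean_forecast(np.array([1.2e-6, 0.8e-6]), mu=0.97, w_inf=0.05, sigma2_inf=1e-6, m=24)
```

A constant forecast equal to the mean realized volatility gives a relative RMSE of exactly 0, which is the reference point of eq. (23).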



5.3 Study of the lag term in the I-GARTCH and GARTCH processes 

In the trend term r[lδt](t) r[lδt](t − lδt), the size of the lag lδt is effectively a parameter that must be fixed. A simple solution is to use the empirical analysis and fix l a priori according to the maxima of the correlation values. Another solution for the GARTCH and I-GARTCH models is to perform a systematic estimation of the processes for various values of l, and to compare the results. This is the aim of Figure 4. Both curves show very clearly 2 minima, located around 3 to 5 hours and around 24 hours. This is precisely in agreement with the maxima of the correlation given in section 3.

The 2 points located at the left end of both curves (with an abscissa corresponding to l = 0.5) correspond to the usual I-GARCH(1) and GARCH(1,1) processes. This figure also shows the improvement provided by the trend term, compared to the mean reversion, as mean reversion is included in GARCH but not in I-GARCH. Roughly, the improvement provided by the trend term is of the order of 2/3 of that provided by the mean reversion, namely it is slightly smaller than the mean reversion effect. This shows that, in sample, the trend effect and mean reversion are of comparable magnitude. In the comparative study below, we have used the trend term at 1 day. The results using an intra-day trend are very similar.




Figure 4: The RMSE between the 1 day realized volatility and the forecast for the daily volatility according to the I-GARTCH(1) and GARTCH(1,1) processes. The horizontal axis gives the lag size l, while the corresponding time interval is lδt with δt = 1 business hour. The 2 points at the extreme left of the curves, located at the arbitrary location l = 0.5, correspond to the usual I-GARCH(1) and GARCH(1,1) processes. The parameters of both processes are estimated on the data set for the different values of l, and the RMSE at the minimum is reported on the vertical axis.



5.4 Comparison between processes for the volatility forecast error 

Our goal in this section is to compare systematically the various processes above in terms of their forecasting performances, in order to measure the importance of the different ingredients included in the equations. We proceed in two stages. First, an in-sample comparison allows for a direct measure of the performances of the processes. In a second step, we measure the out-of-sample forecast performance with a continuous optimization, namely the parameters are estimated over a 5-year moving sample, and the forecast is computed out-of-sample just after the end of the estimation data, so that the parameters are continuously re-estimated on the most recent 5 years of data.

This procedure measures the adequacy of a process to compute a volatility forecast (as in the in-sample procedure), and at the same time the sensitivity of the parameters with respect to the choice of the sub-sample. While the full in-sample procedure is simpler to interpret, the second one corresponds to the best setting that can be used in practice.

                                no trend           with trend
process name                rel.RMSE   corr.   rel.RMSE   corr.   process name
mean in-sample volatility       -1.9     0.0
permanent fcst(1d)              -0.8    46.2
permanent fcst(1w)               8.5    49.2
permanent fcst(1m)               8.0    44.8
RiskMetrics                     10.5    47.7
I-GARCH(1)                      12.2    51.9       13.2    53.2   I-GARTCH(1)
GARCH(1,1)                      13.8    52.6       14.7    53.8   GARTCH(1,1)
I-GARCH(2)                      14.3    53.7
LM-Lin-ARCH(12)                 14.7    54.0       15.9    55.8   LM-Lin-ARTCH(12)
LM-Aff-ARCH(12)                 14.7    54.0       16.0    55.8   LM-Aff-ARTCH(12)
Mkt-Lin-ARCH(4)                 13.8    53.8       14.6    54.9   Mkt-Lin-ARTCH(4)
Mkt-Lin-ARCH(5)                 14.4    53.9       15.3    55.3   Mkt-Lin-ARTCH(5)
Mkt-Lin-ARCH(6)                 14.6    54.0       15.5    55.3   Mkt-Lin-ARTCH(6)
Mkt-Aff-ARCH(4)                 14.7    54.0       15.5    55.2   Mkt-Aff-ARTCH(4)
Mkt-Aff-ARCH(5)                 14.8    54.1       15.7    55.4   Mkt-Aff-ARTCH(5)
Mkt-Aff-ARCH(6)                 14.8    54.1       15.7    55.4   Mkt-Aff-ARTCH(6)

Table 1: One day volatility forecast for the various processes. The measures of quality of the forecast are the relative RMSE, in %, and the linear correlation, in %. The parameters are estimated in-sample, on the full 11-year sample. The Mkt-*-AR(T)CH processes have 4, 5 or 6 components, with time horizons of 4h48, 1 day, 1 week and 1 month; the 5 and 6 component models include a 3-month component; the 6 component model includes a 1-year component. All multi-component models use a microscopic definition for the volatility (see [?] for a comparison with aggregated definitions of volatility).

Taken together, both procedures can be used to build a measure of the robustness of the parameters with respect to the choice of the data sample. We define the measure of robustness with respect to changing data sets by

$$Q = \left[\ \ldots\ \right]^{1/2} \qquad (24)$$

where F[1d, 5y; σ²_eff] is a one day forecast for the realized variance, with the parameters estimated on the subsample from t − 5y to t. The term F[1d, 10y; σ²_eff] is the forecast with the parameters estimated on the full 10-year sample (in-sample forecast). Essentially, the robustness factor Q measures the difference in the forecasts when estimating the parameters on a short sample (5 years) compared to a long sample (10 years), with the forecasts computed, respectively, out-of-sample and in-sample. For large Q values, the process is more dependent on the choice of the sub-samples, and therefore less robust. The robustness factors are given in Table 3.

The results for the in-sample and for the continuous estimation with out-of-sample forecast are 
                                no trend           with trend
process name                rel.RMSE   corr.   rel.RMSE   corr.   process name
permanent fcst(5y)              -5.3    -2.0
permanent fcst(1d)               0.3    46.9
permanent fcst(1w)               7.2    47.3
permanent fcst(1m)               5.5    40.6
RiskMetrics                      8.9    44.7
I-GARCH(1)                      10.7    49.9       11.9    51.1   I-GARTCH(1)
GARCH(1,1)                      12.6    50.8       13.4    51.9   GARTCH(1,1)
I-GARCH(2)                      13.5    52.4
LM-Lin-ARCH(12)                 14.2    53.0       15.7    55.0   LM-Lin-ARTCH(12)
LM-Aff-ARCH(12)                 14.0    52.8       15.5    54.9   LM-Aff-ARTCH(12)
Mkt-Lin-ARCH(4)                 13.1    52.3       14.1    53.6   Mkt-Lin-ARTCH(4)
Mkt-Lin-ARCH(5)                 13.8    52.7       14.8    54.2   Mkt-Lin-ARTCH(5)
Mkt-Lin-ARCH(6)                 14.0    52.6       14.9    54.1   Mkt-Lin-ARTCH(6)
Mkt-Aff-ARCH(4)                 13.7    52.4       14.8    53.9   Mkt-Aff-ARTCH(4)
Mkt-Aff-ARCH(5)                 13.8    52.7       14.8    54.2   Mkt-Aff-ARTCH(5)
Mkt-Aff-ARCH(6)                 13.8    52.7       14.7    54.1   Mkt-Aff-ARTCH(6)



Table 2: As for table 1 but with the parameters estimated on a 5 years moving window, and the
volatility forecast computed out-of-sample.



                    no trend  with trend
process name               Q           Q    process name
permanent fcst(5y)      0.77
permanent fcst(1d)       0.0
permanent fcst(1w)       0.0
permanent fcst(1m)       0.0
RiskMetrics              0.0
I-GARCH(1)              0.11        0.09    I-GARTCH(1)
GARCH(1,1)              0.28        0.28    GARTCH(1,1)
I-GARCH(2)              0.15
LM-Lin-ARCH(12)         0.10        0.10    LM-Lin-ARTCH(12)
LM-Aff-ARCH(12)         0.14        0.13    LM-Aff-ARTCH(12)
Mkt-Lin-ARCH(4)         0.07        0.08    Mkt-Lin-ARTCH(4)
Mkt-Lin-ARCH(5)         0.10        0.13    Mkt-Lin-ARTCH(5)
Mkt-Lin-ARCH(6)         0.12        0.19    Mkt-Lin-ARTCH(6)
Mkt-Aff-ARCH(4)         0.20        0.20    Mkt-Aff-ARTCH(4)
Mkt-Aff-ARCH(5)         0.17        0.17    Mkt-Aff-ARTCH(5)
Mkt-Aff-ARCH(6)         0.16        0.20    Mkt-Aff-ARTCH(6)



Table 3: The robustness factor Q, in %, for processes without and with trend. Larger values indicate 
a stronger dependency on the sub-sample (i.e. less robust). 
given respectively in tables 1 and 2. Overall, the two setups produce consistent results, with the
same ranking between processes. From both tables, the major results about the structure of the 
processes are as follows. 



• The forecast accuracy improves as the memory kernel of the process better approximates the
observed power law decay of the lagged correlations. This leads to the improving sequence:
"permanent forecast" (rectangular memory), RiskMetrics and I-GARCH(1) (exponential kernel),
I-GARCH(2) (longer memory, but still exponential), "long memory" processes (power law kernel).
The shape of the memory kernel appears as the major classifying factor in the table.

• The mean term does not improve the forecast accuracy much, and it is fragile. This is a subtle
point, which concerns the distinction between linear and affine processes (and between
I-GARCH(1), GARCH(1,1) and I-GARCH(2)). A mean term in the process equation for the effective
volatility introduces two constants, namely the mean volatility $\sigma_\infty$ and the corresponding
coupling constant $w_\infty$. These constants set the mean volatility as measured at an infinite time
horizon, and (likely) lead to well defined asymptotic distributions (see [?] for a proof for
GARCH(1,1) and [?] for simulations with a long memory process). By contrast, for the linear
I-GARCH(1) process, the mean volatility is set by the initial conditions, and the asymptotic
distribution is singular, with all the mass at zero volatility. This singular asymptotic property
is likely to hold for all linear processes (see the same references), but with a time to approach
the asymptotic regime given by the longest time horizon in the process. For long term Monte Carlo
simulations, a well defined asymptotic distribution is essential, and therefore affine processes
must be used. This property is different from the (power law) return to the mean after a large
volatility spike, and from the inclusion of the "return to the mean" in a forecast. For a process
with multiple time horizons, the long term components act as a mean for the short time horizons.
For example, a forecast with an I-GARCH(2) process reverts toward the mean for time horizons
between the 2 characteristic times of the EMAs. The comparison of the forecast qualities between
the I-GARCH(1) process (no "mean reversion") and the GARCH(1,1) and I-GARCH(2) processes (both
with "mean reversion" at the 1 day time horizon) shows that this "reversion toward the longer term"
is quantitatively important. Yet, I-GARCH(2) is better than GARCH(1,1), indicating that the
constant mean volatility term is not the relevant factor. Similarly, the results for the long
memory processes show essentially no difference between the linear and affine versions, as all
include a similar "mean reversion" at the 1 day forecast horizon (the longest time horizon in the
process is 6 months).

Moreover, the mean volatility is a parameter that is difficult to estimate, in the sense that it
strongly depends on the chosen estimation sub-sample. This dependency can be seen in the robustness
measure Q, which is systematically larger for all affine processes, roughly by a factor of 2
compared to the corresponding linear processes. The sensitivity of the mean volatility parameter
can also be observed for the linear and affine long memory processes: their forecast qualities are
very similar (the linear process is even slightly better out-of-sample).

To summarize this point, the "reversion toward the mean" must be included in a good forecast, and
it is better achieved with volatility components at long time horizons. This argues for linear
processes in volatility forecasts (like I-GARCH(2) or LM-Lin-ARCH processes), and against affine
processes (like the GARCH(1,1) process). For Monte Carlo simulations at time horizons up to the
largest characteristic time horizon included in the process (like in scenario simulations, or in
risk evaluation), linear processes can be used. Affine processes are needed only for long term
Monte Carlo simulations, where the well defined asymptotic distribution should be close to the
empirical volatility distribution.

• The trend effect is quantitatively important. For all the processes, the relative RMSE is in the
range 10% to 14%, and the addition of trend terms improves it by 1% to 1.5%. This is comparable to
the improvement provided by a power law memory over an exponential memory. Moreover, the
robustness factors Q are very similar when trend terms are included (sometimes even smaller,
despite the larger number of parameters). These quantitative forecast measurements clearly show
that the trend effect is part of the empirical data, and that including it in a process improves
the forecast accuracy.

• For every process with trends, the trend coefficients are positive. This is in agreement with
the explanation given in the introduction in terms of the behaviour of the market participants.
This is also in agreement with the positive correlation between r·L[r] and the realized
volatility, as analyzed in sec. 3.

For the long memory processes with trend LM-*-ARTCH, the trend magnitude is 6o ~ 0.18, 
with an exponent X' ~ 1.3. 

• The market models perform similarly to the long memory processes. For the 4 component model,
the number of components and the related time horizons are taken as in [?]. The models with 5 and
6 components include further time horizons of 3 months and 1 year. For these market models, the
forecasting performance improves continuously with the number of components. This improvement
shows the importance of the long time horizons, even for a forecast at the comparatively very
short horizon of 1 day. Yet, whether in-sample or out-of-sample, it is difficult to do
systematically better than the simpler long memory processes. This indicates that an accurate
modeling of the market components as measured in [?] is not crucial for volatility forecasts, and
that a good caricature of the historical/realized volatility correlations is enough to capture the
main time structure of the memory. With our setup, simple is better.

• The use of a coarse grained definition of volatility does not seem to improve the forecast
accuracy, as studied in detail in [?].
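The memory kernels ranked in the first point above can be compared numerically. The sketch below builds the lag weights of a rectangular kernel (permanent forecast), of a single EMA (exponential kernel, as in I-GARCH(1)), and of a mixture of EMAs with geometrically spaced horizons. The mixture weights decaying as 1 − ln(τ_k)/ln(τ_0) are an assumed long-memory construction used here for illustration; the function names and parameter values are arbitrary.

```python
import numpy as np


def ema_kernel(tau, n_lags):
    """Weights (1 - mu) * mu**k on lags k = 0..n_lags-1, with mu = exp(-1/tau)."""
    mu = np.exp(-1.0 / tau)
    return (1.0 - mu) * mu ** np.arange(n_lags)


def rectangular_kernel(window, n_lags):
    """Equal weights over the last `window` lags, as for a permanent forecast."""
    w = np.zeros(n_lags)
    w[:window] = 1.0 / window
    return w


def long_memory_kernel(tau_1, rho, n_comp, tau_0, n_lags):
    """Mixture of EMAs with geometric horizons tau_k = tau_1 * rho**k.

    Component weights proportional to 1 - ln(tau_k) / ln(tau_0) (assumed form,
    requires tau_0 > max tau_k so that all weights are positive).
    """
    taus = tau_1 * rho ** np.arange(n_comp)
    w = 1.0 - np.log(taus) / np.log(tau_0)
    w = w / w.sum()  # normalise so that the mixture weights sum to one
    return sum(wk * ema_kernel(tk, n_lags) for wk, tk in zip(w, taus))


n = 20000
kernels = {
    "rectangular": rectangular_kernel(24, n),
    "exponential": ema_kernel(24.0, n),
    "long memory": long_memory_kernel(1.0, 2.0, 12, 1e5, n),
}
for name, k in kernels.items():
    # total weight, and weight kept on lags beyond 1000 steps
    print(name, round(k.sum(), 3), k[1000:].sum())
```

Only the EMA mixture keeps a non-negligible weight on lags far beyond its shortest horizon, which is the property that distinguishes the long memory processes from the exponential ones.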



6 Conclusions 

Our systematic investigation of the empirical correlations between the trend term r·L[r] and the
realized volatility agrees with our intuitive explanation in terms of the behaviour of the market
participants: price trends induce position changes that create volatility. Indeed, only zero or
positive correlations are found in the empirical data. Moreover, the correlation is large at the
typical time horizons of the market participants, namely intra-day, 1 day, 1 week (weak
correlation), and 1 month. This decomposition of the price behaviour in terms of the time horizons
of the market participants is fully in line with the historical/realized volatility correlations [?,?].
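The correlation between the trend term r·L[r] and the subsequent realized volatility can be computed as sketched below, with r taken as the return aggregated over h steps and L[r] as the preceding, non-overlapping h-step return. The helper names, the use of the realized variance over the next h steps as the target, and the single horizon h are assumptions of this sketch.

```python
import numpy as np


def aggregated_returns(r, h):
    """Sum of the h returns ending at each index t (nan where fewer than h exist)."""
    r = np.asarray(r, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(r)))
    out = np.full(len(r), np.nan)
    out[h - 1:] = c[h:] - c[:-h]
    return out


def trend_term(r, h):
    """r[h](t) * L[r[h]](t): product of two consecutive, non-overlapping h-step returns."""
    a = aggregated_returns(r, h)
    out = np.full(len(a), np.nan)
    out[h:] = a[h:] * a[:-h]  # a[t] * a[t - h]
    return out


def future_realized_variance(r, h):
    """Sum of squared returns r[t+1..t+h], aligned at t (nan near the end)."""
    r2 = np.asarray(r, dtype=float) ** 2
    c = np.concatenate(([0.0], np.cumsum(r2)))
    n = len(r2)
    out = np.full(n, np.nan)
    out[: n - h] = c[h + 1:] - c[1: n - h + 1]
    return out


def trend_volatility_correlation(r, h):
    """Linear correlation between r*L[r] and the realized variance that follows it."""
    x = trend_term(r, h)
    y = future_realized_variance(r, h)
    mask = ~np.isnan(x) & ~np.isnan(y)
    return np.corrcoef(x[mask], y[mask])[0, 1]
```

On empirical returns, scanning h over intra-day, daily, weekly and monthly horizons would give one correlation per horizon, which is the kind of measurement summarized in the paragraph above.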

The inclusion of trend terms in a process presents no technical difficulties, and clearly improves
the forecast accuracy. For forecasting purposes, the key factor for an efficient process is to
reproduce the power law decay of the memory, as done by the LM-Lin-ARCH process. The trend term
then appears to be the next most important factor. For example, the relative RMSE for the
processes "permanent forecast" (7.2%), I-GARCH(1) (10.7%), I-GARCH(2) (13.5%), LM-Lin-ARCH
(14.2%) and LM-Lin-ARTCH (15.7%) shows the successive improvements provided by a memory shape
converging to a power law, and finally by the trend term. Notice that, compared to the
process name        rel.RMSE   corr.
permanent fcst(1d)     -17.0    37.5
permanent fcst(1w)       4.1    38.2
RiskMetrics             11.1    37.0
I-GARCH(1)              12.7    40.9
GARCH(1,1)              16.8    43.0
I-GARCH(2)              17.3    43.6
LM-Lin-ARCH(12)         18.0    44.2

Table 4: As for table 1, with in-sample optimization, for simulated LM-Aff-ARCH data.



simpler I-GARCH process (1 parameter), the last process (4 parameters) improves the forecast
accuracy by a good margin. By contrast, the inclusion of a mean term in the volatility process
(affine equation) does not improve the forecast much. Yet, the "reversion toward the mean" at the
time horizon considered (1 day in this work) does improve the forecasting performance, but this
"reversion" is better achieved with a process with multiple time horizons. For example, I-GARCH(2)
is better than GARCH(1,1) (both with 3 parameters), and LM-Lin-ARCH is even better (with only
2 parameters).
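To illustrate how little machinery a trend term requires, the sketch below adds θ·r_t·r_{t−1} to the squared return fed into an I-GARCH(1)-like EMA. This single-lag form, the floor on the variance, and all names are hypothetical simplifications, not the ARTCH specification used in the tables; a positive θ matches the positive trend coefficients reported above.

```python
import numpy as np


def ewma_variance_with_trend(returns, mu, theta, floor=1e-12):
    """Final one-step variance forecast of an I-GARCH(1)-like EMA with a trend term.

    Each step feeds r_t**2 + theta * r_t * r_{t-1} into the EMA (hypothetical
    single-lag form); theta = 0 recovers the plain EMA. The floor keeps the
    variance positive when a strongly reverting pair of returns makes the input negative.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = r[0] ** 2  # hypothetical initialisation with the first squared return
    for t in range(1, len(r)):
        x = r[t] ** 2 + theta * r[t] * r[t - 1]
        sigma2 = max(mu * sigma2 + (1.0 - mu) * x, floor)
    return sigma2


up = [0.01] * 50               # trending prices: consecutive returns of equal sign
zigzag = [0.01, -0.01] * 25    # reverting prices: same magnitudes, alternating signs
base = ewma_variance_with_trend(up, 0.9, 0.0)
print(ewma_variance_with_trend(up, 0.9, 0.5) > base)      # trend raises the forecast
print(ewma_variance_with_trend(zigzag, 0.9, 0.5) < base)  # reversion lowers it
```

With θ = 0 both return paths give the same forecast, since only squared returns enter; the trend term is what lets the forecast distinguish trending from reverting prices of identical magnitude.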

In the end, the quantitative improvements obtained when moving to more sophisticated processes and
forecasts seem smaller than one could hope. Considering that 20 years separate the original
GARCH(1,1) process (12.6% relative RMSE) from the LM-Lin-ARTCH (15.7%) process, the 3% improvement
in the relative RMSE is fairly small. One of the lessons learned over those 2 decades is the value
of high frequency data. This knowledge is already included in the present work, which uses
deseasonalised hourly data. High frequency data provide more information on the recent short term
volatility, and therefore improve short term forecasts. Beyond this better use of the information
present in the data, it seems difficult to improve the forecast accuracy substantially.

In order to measure how good our different processes are at forecasting volatility, we use
simulated data. We generate the equivalent of 40 years of hourly data, using a LM-Aff-ARCH process.
The residues are drawn from a Student distribution with 4.5 degrees of freedom, in agreement with a
log-likelihood estimate on the USD/CHF data. Then, we estimate a few processes on this time series;
the results for the full in-sample estimates are reported in table 4. In particular, the estimation
of the LM-ARCH process on the synthetic data has (essentially) no misspecification, which gives a
kind of absolute bound on the achievable forecasting performance. The comforting news is that the
relative RMSEs are similar to those obtained with the USD/CHF time series. This indicates that the
long memory process is good at extracting the available information to build a forecast, but that
the (very) fat tails of the price innovations destroy a large part of the forecastability. In this
sense, the long memory processes are probably quite close to the best that can be achieved. On the
other hand, the correlations differ significantly: ~50% with the empirical data versus ~40% with the
synthetic data. For the time series themselves, the means and standard deviations of the volatility
are respectively 11.1% and 4.5% for the USD/CHF data, and 10.1% and 3.3% for the LM-ARCH data. The
large difference between the standard deviations, as well as the differences between the
correlations of the forecasts, indicate that the LM-ARCH process is not yet a very good model for
the empirical data.
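The simulation setup can be sketched as follows: Student-t residues with 4.5 degrees of freedom, rescaled to unit variance, drive an affine variance equation with mean variance σ∞² and coupling constant w∞. As a stand-in for the LM-Aff-ARCH process, the sketch uses a single EMA component (which amounts to a GARCH(1,1) written in affine form); the function names and parameter values are assumptions.

```python
import numpy as np


def standardized_student_t(rng, df, size):
    """Student-t draws rescaled to unit variance (requires df > 2)."""
    return rng.standard_t(df, size=size) * np.sqrt((df - 2.0) / df)


def simulate_affine_ema_process(n, sigma_inf2, w_inf, mu, df, seed=0):
    """Simulate r_t = sigma_t * eps_t with a one-component affine variance equation.

    sigma2_t = w_inf * sigma_inf2 + (1 - w_inf) * s_t,
    s_t      = mu * s_{t-1} + (1 - mu) * r_{t-1}**2   (EMA of past squared returns).
    """
    rng = np.random.default_rng(seed)
    eps = standardized_student_t(rng, df, n)
    r = np.empty(n)
    sigma2 = np.empty(n)
    s = sigma_inf2  # start the EMA at the long-run variance (assumption)
    for t in range(n):
        sigma2[t] = w_inf * sigma_inf2 + (1.0 - w_inf) * s
        r[t] = np.sqrt(sigma2[t]) * eps[t]
        s = mu * s + (1.0 - mu) * r[t] ** 2  # uses r up to t for the next step
    return r, sigma2


r, sigma2 = simulate_affine_ema_process(100_000, 1e-4, 0.05, 0.94, 4.5, seed=2)
```

Because the variance equation is affine, the stationary mean of σ² equals σ∞²: taking expectations of σ²_t = w∞σ∞² + (1 − w∞)s_t with E[s_t] = E[r²] = E[σ²] gives E[σ²] = σ∞². A linear version (w∞ = 0) has no such anchor, which is the asymptotic difference discussed in the previous section.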

To summarize, this paper presents a first investigation of the trend term, with an empirical
analysis using foreign exchange data, and with a modelling perspective in an ARCH set-up. It shows
that the trend effect is quantitatively important. Therefore, it would be interesting to extend the
present study systematically to data from different asset classes. Another interesting direction is
to also include the tick rate. In the intuitive explanation of the trend effect developed here,
trends trigger trades, which in turn generate volatility. Such effects could be directly measured
using data from exchange traded instruments, where limit orders and market orders are available.